366 research outputs found

    Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale

    Full text link
    Notions of community quality underlie network clustering. While studies surrounding network clustering are increasingly common, a precise understanding of the realtionship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms -- Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on information recovery metrics. Our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it absolutely superior. Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters

    Post-processing partitions to identify domains of modularity optimization

    Full text link
    We introduce the Convex Hull of Admissible Modularity Partitions (CHAMP) algorithm to prune and prioritize different network community structures identified across multiple runs of possibly various computational heuristics. Given a set of partitions, CHAMP identifies the domain of modularity optimization for each partition ---i.e., the parameter-space domain where it has the largest modularity relative to the input set---discarding partitions with empty domains to obtain the subset of partitions that are "admissible" candidate community structures that remain potentially optimal over indicated parameter domains. Importantly, CHAMP can be used for multi-dimensional parameter spaces, such as those for multilayer networks where one includes a resolution parameter and interlayer coupling. Using the results from CHAMP, a user can more appropriately select robust community structures by observing the sizes of domains of optimization and the pairwise comparisons between partitions in the admissible subset. We demonstrate the utility of CHAMP with several example networks. In these examples, CHAMP focuses attention onto pruned subsets of admissible partitions that are 20-to-1785 times smaller than the sets of unique partitions obtained by community detection heuristics that were input into CHAMP.Comment: http://www.mdpi.com/1999-4893/10/3/9

    Learning on Graphs: Supervised and Unsupervised Methods

    Get PDF
    We study two methods for learning from network graph data. First, we present a novel method for the unsupervised learning problem of community detection. The proposed method is, to the best of our knowledge, the first enabling users to "zoom in" and "zoom out" on communities with varying levels of focus on network metadata. Second, we review Decagon, a system proposed by Zitnik et al. for the supervised learning task of link prediction. On a biomedical benchmark dataset, Decagon achieves state-of-the-art prediction accuracy. This work adds to the network scientist's machine learning toolkit, illustrating its power in a biomedical domain with significant public health impact.Bachelor of Scienc

    Sensory Regulation of C. elegans Male Mate-Searching Behavior

    Get PDF
    SummaryHow do animals integrate internal drives and external environmental cues to coordinate behaviors? We address this question by studying mate-searching behavior in C. elegans. C. elegans males explore their environment in search of mates (hermaphrodites) and will leave food if mating partners are absent [1]. However, when mates and food coincide, male exploratory behavior is suppressed and males are retained on the food source [1]. We show that the drive to explore is stimulated by male-specific neurons in the tail, the ray neurons. Periodic contact with the hermaphrodite detected through ray neurons changes the male's behavior during periods of no contact and prevents the male from leaving the food source. The hermaphrodite signal is conveyed by male-specific interneurons that are postsynaptic to the rays and that send processes to the major integrative center in the head. This study identifies key parts of the neural circuit that regulates a sexual appetitive behavior in C. elegans

    Image Hijacking: Adversarial Images can Control Generative Models at Runtime

    Full text link
    Are foundation models secure from malicious actors? In this work, we focus on the image input to a vision-language model (VLM). We discover image hijacks, adversarial images that control generative models at runtime. We introduce Behavior Matching, a general method for creating image hijacks, and we use it to explore three types of attacks. Specific string attacks generate arbitrary output of the adversary's choosing. Leak context attacks leak information from the context window into the output. Jailbreak attacks circumvent a model's safety training. We study these attacks against LLaVA-2, a state-of-the-art VLM based on CLIP and LLaMA-2, and find that all our attack types have above a 90\% success rate. Moreover, our attacks are automated and require only small image perturbations. These findings raise serious concerns about the security of foundation models. If image hijacks are as difficult to defend against as adversarial examples in CIFAR-10, then it might be many years before a solution is found -- if it even exists.Comment: Code is available at https://github.com/euanong/image-hijack

    Annotation and analysis of a large cuticular protein family with the R&R Consensus in Anopheles gambiae

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The most abundant family of insect cuticular proteins, the CPR family, is recognized by the R&R Consensus, a domain of about 64 amino acids that binds to chitin and is present throughout arthropods. Several species have now been shown to have more than 100 CPR genes, inviting speculation as to the functional importance of this large number and diversity.</p> <p>Results</p> <p>We have identified 156 genes in <it>Anopheles gambiae </it>that code for putative cuticular proteins in this CPR family, over 1% of the total number of predicted genes in this species. Annotation was verified using several criteria including identification of TATA boxes, INRs, and DPEs plus support from proteomic and gene expression analyses. Two previously recognized CPR classes, RR-1 and RR-2, form separate, well-supported clades with the exception of a small set of genes with long branches whose relationships are poorly resolved. Several of these outliers have clear orthologs in other species. Although both clades are under purifying selection, the RR-1 variant of the R&R Consensus is evolving at twice the rate of the RR-2 variant and is structurally more labile. In contrast, the regions flanking the R&R Consensus have diversified in amino-acid composition to a much greater extent in RR-2 genes compared with RR-1 genes. Many genes are found in compact tandem arrays that may include similar or dissimilar genes but always include just one of the two classes. Tandem arrays of RR-2 genes frequently contain subsets of genes coding for highly similar proteins (sequence clusters). Properties of the proteins indicated that each cluster may serve a distinct function in the cuticle.</p> <p>Conclusion</p> <p>The complete annotation of this large gene family provides insight on the mechanisms of gene family evolution and clues about the need for so many CPR genes. These data also should assist annotation of other <it>Anopheles </it>genes.</p

    imitation: Clean Imitation Learning Implementations

    Full text link
    imitation provides open-source implementations of imitation and reward learning algorithms in PyTorch. We include three inverse reinforcement learning (IRL) algorithms, three imitation learning algorithms and a preference comparison algorithm. The implementations have been benchmarked against previous results, and automated tests cover 98% of the code. Moreover, the algorithms are implemented in a modular fashion, making it simple to develop novel algorithms in the framework. Our source code, including documentation and examples, is available at https://github.com/HumanCompatibleAI/imitatio
    • …
    corecore